AITopics | actor-critic policy optimization

Actor-Critic Policy Optimization in Partially Observable Multiagent Environments

Neural Information Processing Systems, Sep-29-2025, 23:12:01 GMT

Optimization of parameterized policies for reinforcement learning (RL) is an important and challenging problem in artificial intelligence. Among the most common approaches are algorithms based on gradient ascent of a score function representing discounted return. In this paper, we examine the role of these policy gradient and actor-critic algorithms in partially observable multiagent environments. We show several candidate policy update rules and relate them to a foundation of regret minimization and multiagent learning techniques for the one-shot and tabular cases, leading to previously unknown convergence guarantees. We apply our method to model-free multiagent reinforcement learning in adversarial sequential decision problems (zero-sum imperfect information games), using RL-style function approximation. We evaluate on commonly used benchmark Poker domains, showing performance against fixed policies and empirical convergence to approximate Nash equilibria in self-play with rates similar to or better than a baseline model-free algorithm for zero-sum games, without any domain-specific state space reductions.
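As a rough illustration of the regret-minimization foundation the abstract refers to for the one-shot case, the sketch below runs plain regret matching in self-play on rock-paper-scissors. It is not the paper's policy update rule or its function-approximation setup; the game, the names `PAYOFF`, `regret_matching_policy`, and `self_play`, and the non-uniform initial regrets are all assumptions chosen for the example. It only shows the well-known property that the time-averaged policies of two regret-matching players approach a Nash equilibrium in a zero-sum matrix game.

```python
# Row player's payoff in rock-paper-scissors; the column player receives the negative.
PAYOFF = [
    [0, -1, 1],   # rock
    [1, 0, -1],   # paper
    [-1, 1, 0],   # scissors
]


def regret_matching_policy(regrets):
    """Play each action in proportion to its positive cumulative regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total <= 0.0:
        # No action has positive regret: fall back to the uniform policy.
        return [1.0 / len(regrets)] * len(regrets)
    return [p / total for p in positive]


def self_play(iterations=20000):
    """Return the time-averaged policies of both players after self-play."""
    n = len(PAYOFF)
    # Hypothetical non-uniform starting regrets, used only to break symmetry.
    regrets = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    policy_sums = [[0.0] * n, [0.0] * n]
    for _ in range(iterations):
        row = regret_matching_policy(regrets[0])
        col = regret_matching_policy(regrets[1])
        # Expected value of each action against the opponent's current policy.
        row_values = [sum(PAYOFF[a][b] * col[b] for b in range(n)) for a in range(n)]
        col_values = [sum(-PAYOFF[a][b] * row[a] for a in range(n)) for b in range(n)]
        for player, (policy, values) in enumerate(((row, row_values), (col, col_values))):
            expected = sum(p * v for p, v in zip(policy, values))
            for a in range(n):
                # Regret of not having played action a at this iteration.
                regrets[player][a] += values[a] - expected
                policy_sums[player][a] += policy[a]
    return [[s / iterations for s in sums] for sums in policy_sums]


if __name__ == "__main__":
    for name, policy in zip(("row", "column"), self_play()):
        print(name, [round(p, 3) for p in policy])
```

The current policies may keep cycling; it is the averaged policies that drift toward the uniform equilibrium, which is why the sketch accumulates `policy_sums` rather than returning the last iterate.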

actor-critic policy optimization, name change, observable multiagent environment, (3 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)


Reviews: Actor-Critic Policy Optimization in Partially Observable Multiagent Environments

Neural Information Processing Systems, Oct-8-2024, 07:37:52 GMT

Specifically, it shows the connection by defining a new variant of an actor-critic algorithm that performs an exhaustive policy evaluation at each stage (denoted as policy-iteration-actor-critic), together with an adaptive learning rate. Then, under this setting, it is said that the actor-critic algorithm basically minimizes regret and converges to a Nash equilibrium. The paper suggests a few new versions of policy gradient update rules (Q-based Policy Gradient, Regret Policy Gradient, and Regret Matching Policy Gradient) and evaluates them on multi-agent zero-sum imperfect information games. To my understanding, Q-based Policy Gradient is basically an advantage actor-critic algorithm (up to a transformation of the learned baseline).

3. The authors mention a "reasonable parameter sweep" over the hyperparameters. I'm curious to know the stability of the proposed actor-critic algorithms over the different trials.

4. The paper should be proofread again.
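The reviewer's remark that a Q-based update resembles advantage actor-critic "up to a transformation of the learned baseline" can be made concrete with a small numerical sketch. It assumes a tabular softmax policy over logits in a single state and an all-actions gradient of the form sum over a of w(a) times the gradient of pi(a); this is an illustrative setup, not the paper's exact update rule. The function names `softmax`, `all_actions_gradient`, and `advantages` and the sample numbers are hypothetical. Because the action probabilities sum to one, subtracting any action-independent baseline from the weights leaves this gradient unchanged.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def all_actions_gradient(logits, weights):
    """Gradient of sum_a weights[a] * pi(a) with respect to each logit,
    treating the weights as constants."""
    pi = softmax(logits)
    n = len(logits)
    grad = []
    for k in range(n):
        # For softmax: d pi(a) / d logit_k = pi(a) * (1[a == k] - pi(k)).
        g = sum(
            weights[a] * pi[a] * ((1.0 if a == k else 0.0) - pi[k])
            for a in range(n)
        )
        grad.append(g)
    return grad


def advantages(logits, q_values):
    """Advantage A(a) = q(a) - sum_b pi(b) * q(b) under the softmax policy."""
    pi = softmax(logits)
    baseline = sum(p * q for p, q in zip(pi, q_values))
    return [q - baseline for q in q_values]


if __name__ == "__main__":
    logits = [0.2, -0.5, 1.0]      # hypothetical policy parameters
    q_values = [1.0, 0.0, -2.0]    # hypothetical critic estimates
    print(all_actions_gradient(logits, q_values))
    print(all_actions_gradient(logits, advantages(logits, q_values)))
```

Both printed gradients agree up to floating-point error, and each component equals pi(k) times the advantage of action k, so in this setup the choice between raw action values and advantages matters only once the gradient is estimated from sampled actions.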

actor-critic algorithm, observable multiagent environment, policy gradient, (9 more...)

Neural Information Processing Systems

Industry: Leisure & Entertainment > Games (0.65)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.71)


Actor-Critic Policy Optimization in Partially Observable Multiagent Environments

Srinivasan, Sriram, Lanctot, Marc, Zambaldi, Vinicius, Perolat, Julien, Tuyls, Karl, Munos, Remi, Bowling, Michael

Neural Information Processing Systems, Feb-14-2020, 12:26:45 GMT

actor-critic policy optimization, algorithm, observable multiagent environment, (1 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
